A method to synthesize Arabic from short phonetic

Author

  • Yousif A. El-Imam
Abstract

A system is described that uses short phonetic clusters (speech segments, or synthesis units) to synthesize standard Arabic (SA). The clusters are derived from the Arabic syllables. Basic and phonetic variants of the synthesis units are defined after qualitative and quantitative analyses of the phonetics of the language. A speech database of the synthesis units and their phonetic variants is created, and the units are tested to control their segmental quality. A computer-based TTS system is developed using the method. Speech is synthesized by waveform concatenation. The intelligibility of the synthesized speech is assessed with a standard intelligibility test method adapted to suit the phonetic characteristics of Arabic.

Introduction

Concatenating units extracted from speech utterances is a popular method for segmental speech synthesis from discrete speech segments. The units are usually sub-words of fixed context and duration, i.e. units derived from a given phonetic context and having fixed lengths. Typical units used in the past are diphones, triphones, and demisyllables; one example is the polyphone approach (Bigorne et al., 1991) used with multilingual PSOLA. Small units of speech are prone to a great deal of variation in phonetic quality depending on the context in which they appear. It is therefore often important to conduct an objective study of the contextual phonetic variations of the units, to enhance the basic units with some phonetic variants, and to perform waveform processing in order to synthesize speech of good quality. Arabic has been synthesized using different types of fixed-length discrete synthesis units (El-Imam, 1990). The present method is a further addition to the pool of synthesis algorithms for Arabic speech. It differs from the previous method in that it requires fewer synthesis units. The basic synthesis units are enhanced by a few allophonic variants after analysis of the phonetics of Arabic.
The allophones of the synthesis units are shown to be viable phonetic variants by perceptual and quantitative analysis. After segmentation, the speech database is tested and modified to improve its quality and that of the synthesized speech. Intelligibility tests were conducted on the synthesized speech.

1-The Arabic sounds and their phonetic variations

In Arabic there are 28 consonants, 6 vowels, and two diphthongs. The consonants are (/?/, /b/, /t/, /θ/, /dz/, /H/, /x/, /d/, /δ/, /r/, /z/, /s/, /∫/, /S/, /D/, /T/, /Z/, /E/, /R/, /f/, /q/, /k/, /l/, /m/, /n/, /h/, /w/, and /j/). The Arabic equivalents of the above symbols are, respectively, (ء, ب, ت, ث, ج, ح, خ, د, ذ, ر, ز, س, ش, ص, ض, ط, ظ, ع, غ, ف, ق, ك, ل, م, ن, ه, و, and ي). The vowels are three short (/a/, /u/, and /i/) and their three long counterparts (/a:/, /u:/, and /i:/). The diphthongs are /aj/ and /aw/, formed when the glide /j/ or /w/ is preceded by the short vowel /a/. The articulation of these basic sounds was presented in El-Imam (1990). Numerous contextual phonetic variations can affect the basic Arabic sounds. Among the most prominent are those caused by pharyngealization, nasalization, anticipatory coarticulation, aspiration, and changes in segment duration.

The most important phonetic variations of Arabic sounds are caused by pharyngealization. Pharyngealization is both an intrasyllabic and an intersyllabic phenomenon. It affects all Arabic phonemes, but its effects are most prominent on the vowels (short and long), the two diphthongs (/aj/ and /aw/), the sonorant /l/, the trill /r/, and the counterparts (/t/, /d/, /s/, and /z/) of the Arabic emphatics (/T/, /D/, /S/, and /Z/). The counterparts of the emphatics assimilate to the corresponding emphatic whenever they occur in a context that causes them to become pharyngealized.
A sound that is likely to become pharyngealized will do so when it is in the same syllable as, or next to, an emphatic or another sound that is heavily pharyngealized.

Phonetic variations caused by nasalization affect vowels and diphthongs. All Arabic vowels and the two diphthongs are nasalized when they are followed by a nasal sound (/m/ or /n/). For diphthongs, nasalization is an intersyllabic process (because diphthongs always occur as syllable-closing sounds). For vowels, nasalization can be an intersyllabic or an intrasyllabic process.

Phonetic variations caused by anticipatory coarticulation affect the voiceless stops /t/ and /k/. Either the regular places of articulation of these sounds change, or their articulation overlaps with that of a neighboring vowel. The /t/ and /k/ can be coarticulated in anticipation of a following front or back vowel.

Arabic also has aspiration, the production of an /h/-like sound following the release of the voiceless stops /t/ and /k/. The phoneme /t/ is aspirated when word-final, and /k/ is aspirated when followed by any vowel.

Changes in sound duration affect the Arabic consonants. A consonant can occur in initial, intervocalic, or syllable-closing position. In Arabic, an initial or intervocalic consonant is shorter than the same consonant in syllable-closing position.

2-The basic synthesis units

Three types of synthesis units are defined: a consonant-vowel cluster (CV), a vowel-consonant cluster (VC), and a stable portion of a consonant (C). Arabic syllables are very regular and are characterized by two important facts: the nucleus of every Arabic syllable is a vowel, and the juncture between two closed syllables, or between a closed syllable and a following open or closed syllable, is always a point of minimal acoustic activity. Because of these two facts, it is possible to break the syllables acoustically into such clusters and to use them to synthesize Arabic speech.
Besides, syllables are known to be basic units that carry information about the prosody of speech.

The synthesis process is as follows. A syllable of type CV or CV: is synthesized from a CV unit. A syllable of type CVC or CV:C is synthesized from a CV plus a VC unit. The uncommon syllables of type CVCC or CV:CC are synthesized from a CV, a VC, and a lone C synthesis unit.

The difference between this synthesis method and diphone synthesis lies in the treatment of words of composite syllabic structure (words formed of closed syllables, or of a mixture of closed syllables followed by closed or open syllables). The regularity of Arabic syllables and minimal acoustic activities across the juncture points between constituent syllables of such words

ISCA Archive
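The mapping from syllable types to synthesis units described in section 2 can be illustrated with a small sketch. It is not the author's implementation: the function names, the list-of-phonemes input format, and the tuple representation of units are assumptions made here for illustration. In particular, the text does not say how the vowel is divided between the CV and VC units, nor whether the VC unit of a CV:C syllable carries the long vowel; the sketch simply repeats the vowel symbol in both units.

```python
# Vowel symbols as listed in section 1 (three short, three long).
VOWELS = {"a", "u", "i", "a:", "u:", "i:"}


def syllable_to_units(syllable):
    """Map one syllable to a list of (unit_type, phonemes) tuples.

    `syllable` is a list of phoneme symbols, e.g. ["k", "a", "t"].
    This input format is a hypothetical choice for the sketch.
    """
    if len(syllable) < 2 or syllable[0] in VOWELS or syllable[1] not in VOWELS:
        raise ValueError("expected a syllable that starts with C followed by V")
    onset, vowel, coda = syllable[0], syllable[1], syllable[2:]
    if len(coda) > 2 or any(p in VOWELS for p in coda):
        raise ValueError("expected at most two closing consonants")

    # CV and CV: syllables need only a CV unit.
    units = [("CV", (onset, vowel))]
    # CVC and CV:C syllables add a VC unit.
    if coda:
        units.append(("VC", (vowel, coda[0])))
    # CVCC and CV:CC syllables add a lone C unit for the final consonant.
    if len(coda) == 2:
        units.append(("C", (coda[1],)))
    return units


def word_to_units(syllables):
    """Concatenate the unit sequences of all syllables in a word."""
    units = []
    for syllable in syllables:
        units.extend(syllable_to_units(syllable))
    return units


if __name__ == "__main__":
    # A three-syllable example with the structure CV-CVC-CV.
    for unit in word_to_units([["k", "a"], ["t", "a", "b"], ["t", "u"]]):
        print(unit)
```

In a concatenative system each returned tuple would serve as a lookup key into the recorded unit database; selecting among the phonetic variants of a unit (pharyngealized, nasalized, aspirated, and so on) would be a separate step that this sketch does not attempt.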





Publication date: 2000